(57) Abstract

A method of detecting I-frames in a video signal that has previously been MPEG coded involves taking a DCT and analysing the
frequency of zero-value coefficients. An I-frame, which does not utilise prediction coding, is expected to have a higher number of zero
coefficients than a predicted P- or B-frame.
ANALYSIS OF COMPRESSION DECODED VIDEO IMAGE SEQUENCES

This invention relates to the analysis of compression decoded sequences 
and in the most important example to the analysis of video signals that have 
been decoded from an MPEG bitstream. 

It is now understood that processes such as re-encoding of a video signal 
can be significantly improved with knowledge of at least some of the coding 
decisions used in the original encoding. 

Proposals have been made for making some or all of these coding 
decisions available explicitly. Examples of these proposals can be seen in 
previous patent applications [see EP 0 765 576 and EP 0 913 058]. These 
methods involve the use of an Information Bus which passes MPEG coding 
parameters from a decoder to a subsequent re-coder. 

However, in certain situations, no such explicit information is available; 
and the only available information is that contained in the decoded video signal. 

It will be well understood that in the MPEG-2 video compression standard
there are different categories of frames, which differ in the degree to which
they are coded using prediction, and that these categories are denoted
I-, P- and B-frames respectively. An important coding decision to be taken into
consideration in a re-encoding process is accordingly the frame structure in
terms of the I-, P- and B-frames.

Accordingly, it is an object of the invention to determine by analysis of 
the video signal, information concerning the upstream coding and decoding 
process that is useful in minimizing degradation of picture quality in a 
subsequent, downstream coding and decoding process. 

A further object of the present invention is to derive information from a 
decoded video signal concerning the categories of frames employed in the 
encoding process. It is a further object of this invention to assist in maintaining 
picture quality when cascading compression decoding and coding processes. 

WO 00/22831 PCT/GB99/03359

Accordingly the present invention consists in a method of analysing a
signal derived in coding and decoding processes which utilise a quantisation
process having a set of quantisation values in which the coded signal contains
categories of frames which categories differ in the degree to which their frames
are coded using prediction, the method comprising the steps of measuring the 
occurrence of values in the signal corresponding with the set of possible 
quantisation values, and inferring the category of a specific frame by testing the 
occurrence of said values against a threshold.

The invention will now be described by way of example in reference to 
the accompanying drawings, in which: 

Figure 1 is a block diagram illustrating the use of an Information Bus as in 
the prior art; 

Figure 2 is a block diagram showing one embodiment of the present
invention;

Figure 3 is a graph illustrating the operation of one embodiment of the 
present invention; and 

Figure 4 is a block diagram illustrating in more detail a specific part of the 
apparatus of the Figure 2 embodiment.

Referring initially to Figure 1, the MPEG decoder (100), adapted as shown 
in the above prior references, receives an MPEG bitstream. In addition to the 
output of a standard MPEG decoder, this adapted decoder then produces an 
Information Bus output conveying the coding decisions taken in the upstream 
encoder, which are of course inherent in the MPEG bitstream. The Information
Bus is then passed to the dumb coder (102) along with the video signal. This 
dumb coder then follows the coding decisions made by the upstream coder (not 
shown) which are conveyed by the Information Bus. 

This invention is related to the situation in which the adapted MPEG 
decoder cannot be used to produce the Information Bus because the MPEG
bitstream has already been decoded by a standard decoder and the input 
bitstream is no longer available. The aim is to try to estimate as many as 
possible of the MPEG coding parameters by analysing the decoded video signal. 
When the Information Bus is used, transparent cascading is only possible
when all the relevant parameters at sequence, GOP, picture, slice and
macroblock rate are carried. Clearly, it is not possible to estimate all these
parameters from a decoded picture alone. However, a proportion of the benefit
of the Information Bus can be obtained even if the only information carried from
decoder to coder relates to the picture type (I-, P- or B-frame). In this case, the 
'dumb' coder of Figure 1 becomes a full MPEG coder except that it sets the 
picture type to that received in the Information Bus. 

The purpose of the present invention is to estimate the picture type by
analysing the decoded video signal. Such an estimate would be used as shown 
in Figure 2. In this embodiment, the standard MPEG decoder (200) receives the 
bitstream input, and decodes it, producing the usual video signal output. This is 
then passed to the MPEG coder (206), and also to the picture type detector
(202). The information from the picture type detector is then passed to an
Information Bus generator (204). The Information Bus then generated, relating to
picture type only, is then passed to the MPEG coder. This then follows the 
coding information available in the Information Bus in coding the video signal, 
producing an output bitstream. 

The following description concentrates on one particular aspect of the
picture type detector, the detection of I-frames. In this embodiment, the coding
processes referred to are those of MPEG-2, in which the different categories of
frames which differ in the degree to which they are coded using prediction are
the picture types I, P, and B. The I-frames are coded with no prediction; the
P-frames with only forward prediction and the B-frames with both forward and
backward prediction.

The invention relies on the observation that intra coded blocks in MPEG-2 
are the direct output of an inverse DCT function, whereas predicted macroblocks 
are the result of an inverse DCT function added to a prediction. If we take the
forward DCT of a picture that has been decoded from an MPEG-2 bitstream,
then we would expect the DCT coefficients of intra blocks to take only values
that were in the set of quantizer reconstruction levels specified in the MPEG-2
standard. The DCT coefficients of predicted blocks might occasionally exhibit
this property, but this would only be fortuitous. Unfortunately, without
knowledge of the quantizer step size and weighting matrix used in the original
encoder, the only quantizer reconstruction level that we know to exist is zero.
However, because the distribution of DCT coefficients (even for intra blocks) is
highly peaked around zero, we can still expect a large number of DCT
coefficients of intra blocks to be equal to zero.
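
The observation above can be illustrated with a minimal sketch, which is not taken from the specification. It assumes an orthonormal 8x8 DCT-II written in plain Python and treats a coefficient as zero when its magnitude falls below a tolerance; the names `dct_2d` and `zero_coefficient_count` and the tolerance of 0.5 are illustrative assumptions, chosen because decoded pixels are rounded to integers and exact floating-point zeros are therefore unlikely.

```python
import math

N = 8  # MPEG-2 uses 8x8 DCT blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    # basis[k][n] = c(k) * cos((2n + 1) * k * pi / (2N))
    basis = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
              for n in range(N)] for k in range(N)]
    # Transform along each row first, then along each column.
    rows = [[sum(basis[k][n] * row[n] for n in range(N)) for k in range(N)]
            for row in block]
    return [[sum(basis[u][y] * rows[y][v] for y in range(N)) for v in range(N)]
            for u in range(N)]


def zero_coefficient_count(block, tolerance=0.5):
    """Count coefficients with magnitude below `tolerance` (an assumed value)."""
    coeffs = dct_2d(block)
    return sum(1 for row in coeffs for c in row if abs(c) < tolerance)


# A flat block has only a DC term, so 63 of its 64 coefficients count as zero.
flat = [[128] * N for _ in range(N)]
flat_count = zero_coefficient_count(flat)
```

Summing `zero_coefficient_count` over all luminance blocks of a frame would give a per-frame count of the kind discussed in the following paragraphs.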

If we count the number of zero DCT coefficients in each frame, we would
expect a high number to indicate either that the frame is an I-frame and contains
only intra blocks, or that it is a predicted frame in which a very high proportion of the
blocks are intra coded. In either case, it would be acceptable to judge such a
frame as an I-frame for the purposes of optimizing the performance of the
re-coding step. One slight complication is that, for luminance blocks, there are two
options for every macroblock for the input to the DCT process: dct_type can be
either frame based or field based. This problem can be avoided by simply taking
both kinds of DCT in parallel and including both in the count, accepting the fact
that there will be a 2:1 'dilution' of the result. While chrominance blocks do not
have this problem in Main Profile MPEG coding, they are not used in the
preferred form of the invention because of the ambiguities involved in
transcoding between the 4:2:0 and 4:2:2 formats.
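
The two block organisations can be sketched as follows. This is an illustrative sketch only: it assumes a 16x16 luminance macroblock held as a list of 16 rows, with even lines forming the top field and odd lines the bottom field, and the helper names `frame_blocks`, `field_blocks` and `combined_zero_count` are hypothetical.

```python
def frame_blocks(macroblock):
    """Split a 16x16 luminance macroblock into four 8x8 frame-based blocks."""
    top, bottom = macroblock[:8], macroblock[8:]
    return [[row[:8] for row in top], [row[8:] for row in top],
            [row[:8] for row in bottom], [row[8:] for row in bottom]]


def field_blocks(macroblock):
    """Split the same macroblock into four 8x8 field-based blocks.

    Even lines are taken as the top field, odd lines as the bottom field.
    """
    top_field, bottom_field = macroblock[0::2], macroblock[1::2]
    return [[row[:8] for row in top_field], [row[8:] for row in top_field],
            [row[:8] for row in bottom_field], [row[8:] for row in bottom_field]]


def combined_zero_count(macroblock, count_zeros):
    """Add the zero counts from both organisations.

    `count_zeros` is any callable returning the zero-coefficient count
    of one 8x8 block; it is supplied by the caller.
    """
    blocks = frame_blocks(macroblock) + field_blocks(macroblock)
    return sum(count_zeros(block) for block in blocks)
```

Because only one of the two organisations matches the dct_type actually used for a given macroblock, roughly half of the eight blocks counted contribute the expected zeros, which is the 2:1 'dilution' mentioned above.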

Figure 3 illustrates aspects of this particular embodiment of the invention. 
Graph B in Figure 3 shows the number of zero coefficients encountered for each 
frame in a typical sequence in which the first and every 12th subsequent frame
is an I-frame.

Clear peaks can be seen which indeed correspond to the existence of
I-frames. In principle, we should be able to apply a threshold to the curve in
order to detect I-frames. However, as the graph illustrates, a simple fixed
threshold will not always work. The sequence has a change in scene content
around frame 190, and this leads to a drop both in the counts of zero
coefficients for I-frames and in the 'background count' for other frames. Some
kind of adaptive threshold filter is therefore required. There now follows an
example showing how the threshold could be adapted.

The filter sets an initial threshold which is then modified once per frame.
If the zero-coefficient count (ZCC) exceeds the threshold, then an I-frame is
deemed to have been detected and the threshold is reset to the ZCC value;
otherwise the threshold is decreased towards the ZCC value at a rate governed
by a factor, known here as the Threshold Modifier Factor (TMF):

\[
T_n = \frac{T_{n-1} \cdot \mathrm{TMF} + V_n}{\mathrm{TMF} + 1}
\]

where $T_n$ is the threshold at frame $n$, and $V_n$ is the ZCC value at time $n$.

It can be seen that if the incoming ZCC data is constant, then the 
threshold will tend to that constant. 
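
A minimal sketch of this per-frame update is given below, assuming the threshold and the TMF are held as floating-point values; the function name `update_threshold` and the example numbers are illustrative only. With a constant ZCC value V, each update reduces the gap between the threshold and V by the ratio TMF / (TMF + 1), which is why the threshold tends to V.

```python
def update_threshold(threshold, zcc, tmf):
    """Return (new_threshold, i_frame_detected) for one frame."""
    if zcc > threshold:
        # Peak found: declare an I-frame and reset the threshold to the count.
        return float(zcc), True
    # Otherwise relax the threshold towards the current count.
    return (threshold * tmf + zcc) / (tmf + 1.0), False


# Constant ZCC input: the threshold decays from 1000 towards 400.
threshold = 1000.0
for _ in range(200):
    threshold, detected = update_threshold(threshold, 400, tmf=4.0)
```

A larger TMF gives more weight to the previous threshold, so the threshold falls more slowly; a smaller TMF lets it track the incoming counts more quickly.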



The TMF is determined using a confidence test. The confidence test looks
at the position of the last I-frame found and the number of frames between the
last two detected I-frames, and assumes that the I-frames are occurring at
regular intervals. If there were no frames between the last two detected
I-frames, then we assume that a false I-frame was detected, so the confidence
is set to 0.5 until another I-frame is found; otherwise the confidence is
determined as:

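\[
\mathrm{Confidence} = \exp\left( - \left( \frac{\mathrm{CurrentFramePosition} - \mathrm{ExpectedIFramePosition}}{\ldots} \right) \right)
\]

[the denominator and any exponent of this expression are not legible in the source]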

The confidence is clipped to the range 0 (no peak expected) to 1 (peak
expected).

The TMF is determined as : 



\[
\mathrm{TMF} = \mathrm{Max} - \mathrm{Confidence} \times (\mathrm{Max} - \mathrm{Min})
\]



where Min and Max are the minimum and maximum values the TMF is allowed 
to take. 

It can be seen that the greater the confidence, the smaller the TMF, hence 
the more rapidly the threshold decreases until an I-frame is detected.
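
The mapping from confidence to TMF can be sketched as below. The function name and the example limits of 2 and 10 are assumptions made for illustration; the specification does not give values for Min and Max.

```python
def threshold_modifier_factor(confidence, tmf_min, tmf_max):
    """Interpolate the TMF between its limits from a clipped confidence."""
    # Clip the confidence to the range 0 (no peak expected) to 1 (peak expected).
    confidence = min(1.0, max(0.0, confidence))
    return tmf_max - confidence * (tmf_max - tmf_min)


# Illustrative limits only: Min = 2, Max = 10.
tmf_low_confidence = threshold_modifier_factor(0.0, 2.0, 10.0)
tmf_high_confidence = threshold_modifier_factor(1.0, 2.0, 10.0)
```

With these example limits, zero confidence gives the maximum TMF and a slowly falling threshold, while full confidence gives the minimum TMF and a threshold that falls quickly towards the incoming counts.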

The above description is simply an example; other systems of adaptive
thresholding could equally be used.

A refinement to the detection method could be to make an explicit and 
separate detection of scene changes and incorporate this into the confidence 
measure. This could be augmented by a priori knowledge of the strategy 
adopted by the original encoder to modify or reset the GOP structure at scene
changes. 

The performance of the complete detection method is illustrated in Figure
3, where the four curves show: the threshold, graph A; the zero-coefficient
count (ZCC) value, graph B; the confidence value, graph C; and the result (a
downward pointing spike indicating that an I-frame has been detected),
graph D.

A further particular aspect of the picture type detector involves the
detection of all the different categories of frames which make up the frame
structure in the video signal and not merely the I-frames. The method here is
similar to that employed in detecting the I-frames. In general, a particular
P-frame is not necessarily noticeably more intra-coded than a particular
B-frame, but if the analysis is performed as for the detection of I-frames, and an
average of the results over a large number of frames is taken, it is found that the
P-frames are, on average, slightly more intra-coded than the B-frames. This
result can be used to determine the particular frame structure which was used in
the original encoding.

Given a specific number of non-I-frames between two I-frames, there are
only a few different types of frame structure which are used in conventional
encoders. For example, given an I-frame every sixth frame, the structure will
typically be I, B, B, P, B, B, I or I, B, P, B, P, B, I. The changes, if any, between
frame structures in a given signal are also reasonably infrequent, allowing a
large possible sample from which an average can be taken. If an average of the
results of an analysis as above is taken, the subtle trends in the numbers of
zero coefficients found for the different types of predicted frame can be used,
along with the knowledge that there are only a certain known number of
different frame structures available, to deduce the frame structure which was
used in encoding the given signal. This information can be used alongside
information regarding the position of the I-frames to give more detailed
information to the MPEG coder (as in Figure 2, (206)) via the Information Bus.
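
One way such a deduction might be carried out is sketched below. The scoring rule is an assumption, not the method of the specification: it averages the zero-coefficient count at each position within the group of pictures and prefers the candidate structure whose P positions show the largest average excess over its B positions. The function names are hypothetical, the sequence is assumed to start on an I-frame, and all candidates are assumed to have the same length and to contain at least one P and one B. Candidate strings omit the closing I-frame of the next group.

```python
def average_by_position(zcc_values, gop_length):
    """Average the ZCC at each position within the GOP (position 0 = I-frame)."""
    sums = [0.0] * gop_length
    counts = [0] * gop_length
    for index, value in enumerate(zcc_values):
        sums[index % gop_length] += value
        counts[index % gop_length] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def pick_structure(zcc_values, candidates):
    """Return the candidate whose P positions most exceed its B positions."""
    averages = average_by_position(zcc_values, len(candidates[0]))

    def score(pattern):
        p = [averages[i] for i, kind in enumerate(pattern) if kind == "P"]
        b = [averages[i] for i, kind in enumerate(pattern) if kind == "B"]
        return sum(p) / len(p) - sum(b) / len(b)

    return max(candidates, key=score)


# Synthetic counts in which position 3 is slightly higher than the B positions.
example_counts = [900, 100, 100, 130, 100, 100] * 20
chosen = pick_structure(example_counts, ["IBBPBB", "IBPBPB"])
```

The example counts are synthetic and serve only to exercise the sketch; real counts would come from the zero-coefficient analysis described above, averaged over a large number of frames.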

Figure 4 gives a block diagram of an I-frame detector using the method
described above. The video signal is first converted to field blocks (400) and
frame blocks (401). The signal from each of these is then passed first through a
DCT (402), and the zero coefficients for the signal are then counted (404). The
two counts are then added and the combined zero coefficient count is compared
at 408 with a threshold. An I-frame is detected if the threshold is exceeded.
The threshold is calculated at 406 utilising, as described above, information from
the combined zero coefficient count and the location of the last detected
I-frame.

It will be understood that this invention has been described by way of 
example only, and a wide variety of modification is possible without departing
from the scope of the invention.

Thus, whilst the described counting of zero coefficients has the advantage 
of not requiring prior knowledge of the full set of possible quantisation values, 
there will be circumstances in which it will be appropriate to measure the 
occurrence of other values. Also, although the example of MPEG-2 is of course
very important, the invention is also applicable to other coding schemes which
utilise categories of frames which differ in the degree to which frames are coded 
using prediction. 
CLAIMS

1. A method of analysing a signal derived in coding and decoding
processes which utilise a quantisation process having a set of 
quantisation values and in which the coded signal contains 
categories of frames which categories differ in the degree to which 
frames are coded using prediction, the method comprising the steps 
of measuring the occurrence of values in the signal corresponding 
with the set of possible quantisation values, and inferring the 
category of a specific frame by testing the occurrence of said 
values against a threshold. 

2. A method according to Claim 1, in which said threshold is varied in 
accordance with the expected pattern of said categories of frames 
in the coded signal. 

3. A method according to Claim 2, in which said threshold varies with 
the number of frames since detection in the decoded signal of a 
particular category of frame. 

4. A method according to Claim 3, in which said particular category of 
frame contains those frames which are coded with no prediction. 

5. A method according to any one of the preceding claims, in which 
the occurrence of zero values is measured. 

6. A method according to any one of the preceding claims, in which 
said coding and decoding processes are those employed in MPEG-2, 
and in which said categories of frames are denoted I-, P- and
B-frames respectively.